Detecting public network attacks using signatures and fast content analysis

ABSTRACT

Network worms or viruses are a growing threat to the security of public and private networks and the individual computers that make up those networks. A content sifting method is provided that automatically generates a precise signature for a worm or virus that can then be used to significantly reduce the propagation of the worm elsewhere in the network or eradicate the worm altogether. The content sifting method is complemented by a value sampling method that increases the throughput of network traffic that can be monitored. Together, the methods track the number of times invariant strings appear in packets and the network address dispersion of those packets that include the invariant strings. When an invariant string reaches a particular threshold of appearances and address dispersion, the string is reported as a signature for a suspected worm.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under contracts ANI-0137102 and 60NANB1D0118 awarded by the National Science Foundation and the National Institute of Standards and Technology, respectively. The government has certain rights in the invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national stage application of and claims the benefit of PCT/US2004/040149 filed on Dec. 1, 2004, now WO 2005/103899, which claims the benefit of priority from U.S. patent application Ser. No. 10/822,226, filed on Apr. 8, 2004. Both applications are incorporated herein by reference in their entirety.

BACKGROUND

1. Field of the Invention

The present disclosure generally relates to the field of network security and more particularly relates to the prevention of self-propagating worms and viruses through data traffic analysis.

2. Related Art

Many computers are connected to publicly-accessible networks such as the Internet. This connection has made it possible to launch large-scale attacks of various kinds against computers connected to the Internet. A large-scale attack is an attack that involves several sources and destinations, and which often (but not necessarily) involves a large traffic footprint. Examples of such large-scale attacks may include: (a) viruses, in which a specified program is caused to run on the computer, which then attempts to spread itself to other computers known to the host computer (e.g., those listed in the address book); and (b) denial of service attacks (DoS), in which a group of computers is exposed to so many requests that it effectively loses the ability to respond to legitimate requests. Many viruses and worms indirectly cause DoS attacks as well for networks by sending a huge amount of traffic while replicating. Distributed denial of service (DDoS) occurs when an attacker uses a group of machines (sometimes known as zombies) to launch a DoS attack.

Another form of large-scale attack is called backdoor or vulnerability scanning. In such an attack an intruder scans for backdoors at machines or routers. A backdoor is a method by which a previously attacked machine can then be enlisted by future attackers to be part of future attacks.

Spam is unsolicited network messages often sent for commercial purposes. Large-scale spam is often simply the same as (or small variants of) the spam sent to multiple recipients. Note that this definition of spam includes both email as well as newer spam variants such as Spam Sent Over Instant Messenger.

A specific form of attack is an exploit, which is a technique for attacking a computer that allows the intruder to take control of the target computer and run the intruder's code on the attacked machine. A worm is a large-scale attack formed by an exploit along with propagation code. Worms can be highly efficacious, since they can allow the number of infected computers to increase geometrically. The worm can do some specific damage, or alternatively can simply take up network bandwidth and computation, or can harvest e-mail addresses or take any other desired action.

Many current worms propagate via random probing. In the context of the Internet, each computer has an IP address, which is a 32-bit address. The probing can simply randomly probe different combinations of 32-bit addresses, looking for machines that are susceptible to the particular worm. Once a machine is infected, that machine starts running the worm code, and itself begins probing the Internet. The infection thus progresses geometrically.

A very common exploit is a so-called buffer overflow. In computers, different areas of memory are used to store various pieces of information. One area in memory may be associated with storing information received from the network: such areas are often called buffers. However, an adjoining area in the memory may be associated with an entirely different function. For example, a document name used for accessing Internet content (e.g., a URL) may be stored into a URL buffer. However, this URL buffer may be directly adjacent to protected memory used for program access. In a buffer overflow exploit, the attacker sends a URL that is longer than the longest possible URL that can be stored in the receiver buffer and so overflows the URL buffer, which allows the attacker to store the latter portion of its false URL into protected memory. By carefully crafting an extra long URL (or other message field), the attacker can overwrite the return address, and cause execution of specified code by pointing the return address to the newly installed code. This causes the computer to transfer control to, and execute, what is now the attacker code.

The above has described one specific exploit (and hence worm) exploiting the buffer overflow. A security patch that is intended for that exact exploit can counteract any worm of this type. However, the operating system code is so complicated that literally every time one security hole is plugged, another is noticed. Further, it often takes days for a patch to be sent by the vendor; worse, because many patches are unreliable and end users may be careless in not applying patches, it may be days, if not months, before a patch is applied. This allows a large window of vulnerability during which a large number of machines are susceptible to the corresponding exploit. Many worms have exploited this window of vulnerability.

A signature is a string of bits in a packet that characterizes a specific attack. For example, an attempt to execute the perl program at an attacked machine is often signaled by the string “perl.exe” in a message/packet sent by the attacker. Thus a signature-based blocker could remove such traffic by looking for the string “perl.exe” anywhere in the content of a message. The signature could, in general, include header patterns as well as exact bit strings, as well as bit patterns (often called regular expressions) which allow more general matches than exact matches.
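
By way of illustration only, a signature-based blocker of the kind just described might be sketched as follows. The function name, the signature list, and the sample payloads are assumptions made for this sketch and are not taken from the disclosure; only exact byte-string signatures are handled, not header patterns or regular expressions.

```python
def matches_signature(payload: bytes, signatures: list) -> bool:
    """Return True if any known signature occurs anywhere in the payload."""
    return any(sig in payload for sig in signatures)


# Hypothetical signature list and traffic, for illustration only.
SIGNATURES = [b"perl.exe"]
packets = [
    b"GET /index.html HTTP/1.0",
    b"GET /scripts/perl.exe?cmd HTTP/1.0",
]

# Keep only the packets that match no known signature.
allowed = [p for p in packets if not matches_signature(p, SIGNATURES)]
```

Such a blocker is only as good as its signature list, which is the limitation discussed in the paragraphs that follow.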

While the exact definition of the different terms above may be a matter of debate, the basic premise of these, and other attacks, is the sending of undesired information to a publicly accessible computer connected to a publicly accessible network, such as the Internet.

Different ways are known to handle such attacks. One such technique involves using the signature, and looking for that signature in Internet traffic to block anything that matches that signature. A limitation of this technique has come from the way that such signatures are found. The signature is often not known until the first attacks are underway, at which point it is often too late to effectively stop the initial (sometimes called zero-day) attacks.

An Intrusion Detection System (IDS) may analyze network traffic patterns to attempt to detect attacks. Typically, IDS systems focus on known attack signatures. Such intrusion detection systems, for example, may be very effective against so-called script kiddies who download known scripts and attempt to use them over again, at some later time.

Existing solutions to attacks each have their own limitations. Hand patching is when security patches from the operating system vendor are manually installed. This is often too slow (takes days to be distributed). It also requires large amounts of resources, e.g., the person who must install the patches.

A firewall may be positioned at the entrance to a network, and reviews the packets coming from the public portion of the network. Some firewalls only look at the packet headers; for example, a firewall can route e-mail that is directed to port 25 to a corporate e-mail gateway. The firewalls may be useful, but are less helpful against disguised packets, e.g., those disguised by being sent to other well-known services.

Intrusion detection and prevention systems, and signature based intrusion systems look for an intrusion in the network. These are often too slow (because of the time required for humans to generate a signature) to be of use in a rapidly spreading, new attack.

Other systems can look for other suspicious behavior, but may not have sufficient context to realize that certain behavior accompanying a new attack is actually suspicious. For example, a common technique is to look for scanning behavior but this is ineffective against worms and viruses that do not scan. This leads to so-called false negatives where more sophisticated attacks (increasingly common) are missed.

Scan detection makes use of the realization that an enterprise network may be assigned a range of IP addresses, and may only use a relatively small portion of this range for the workstations and routers in the network. Any outside attempts to connect to stations within the unused range may be assumed to be suspicious. When multiple attempts are made to access stations within this address space, they may increase the level of suspicion and make it more likely that a scan is taking place. This technique has been classically used as part of the so-called network telescope approach.
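
The unused-address idea can be sketched roughly as below. The address range, the set of in-use stations, the threshold, and the function name are all hypothetical values chosen for this sketch; a real deployment would derive them from the enterprise's own address plan.

```python
import ipaddress
from collections import Counter

# Hypothetical enterprise allocation and the small portion actually in use.
ASSIGNED = ipaddress.ip_network("10.1.0.0/16")
IN_USE = {ipaddress.ip_address("10.1.0.5"), ipaddress.ip_address("10.1.0.9")}
SUSPICION_THRESHOLD = 3  # assumed number of probes before a source is flagged

probes = Counter()  # outside source -> attempts made into the unused range


def observe(src: str, dst: str) -> bool:
    """Record one connection attempt; return True once src looks like a scanner."""
    d = ipaddress.ip_address(dst)
    if d in ASSIGNED and d not in IN_USE:
        probes[src] += 1
    return probes[src] >= SUSPICION_THRESHOLD
```

Each additional probe into the unused range raises the count for that source, mirroring the increasing level of suspicion described above.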

SUMMARY

A content sifting system and method is provided that automatically generates a signature for a worm or virus. The signature can then be used to significantly reduce the propagation of the worm elsewhere in the network or eradicate the worm altogether. A complementary value sampling method and system is also provided that increases the throughput of network traffic that can be monitored. Together, the methods and systems identify invariant strings that appear in or across packets and track the number of times those invariant strings appear along with the network address dispersion of those packets that include the invariant strings. When an invariant string reaches a particular threshold of appearances and address dispersion, the string is reported as a signature for a suspected attack.

BRIEF DESCRIPTION OF THE DRAWINGS

The details of the present invention, both as to its structure and operation, may be gleaned in part by study of the accompanying drawings, in which like reference numerals refer to like parts, and in which:

FIG. 1A is a network diagram illustrating an example network according to an embodiment of the invention;

FIG. 1B is a network diagram illustrating an example network according to an embodiment of the invention;

FIGS. 2A-C are block diagrams illustrating an example sensor unit according to an embodiment of the invention;

FIG. 3 is a block diagram illustrating an example packet according to an embodiment of the present invention;

FIG. 4 is a block diagram illustrating an example content prevalence table according to an embodiment of the invention;

FIG. 5 is a block diagram illustrating an example address dispersion table according to an embodiment of the invention;

FIG. 6 is a functional block diagram illustrating an example hashing technique according to an embodiment of the invention; and

FIG. 7 is a flow diagram illustrating an example process for identifying a worm signature according to an embodiment of the invention.

DETAILED DESCRIPTION

The present description is related to U.S. patent application Ser. No. 10/822,226 entitled DETECTING PUBLIC NETWORK ATTACKS USING SIGNATURES AND FAST CONTENT ANALYSIS, filed on Apr. 8, 2004, which is incorporated herein by reference in its entirety.

Certain embodiments as disclosed herein provide for systems and methods for identifying an invariant string or repeated content to serve as a signature for a network attack such as a worm or virus. For example, one method and system as disclosed herein allows for a firewall or other sensor unit to examine packets and optimally filter those packets so that invariant strings within or across packets are identified and tracked. When the frequency of occurrence of a particular invariant string reaches a predetermined threshold and the number of unique source addresses and unique destination addresses also reach a predetermined threshold, the particular invariant string is reported as a signature for a suspected worm. For ease of description, the example embodiments described below refer to worms and viruses. However, the described systems and methods also apply to other network attacks and the invention is not limited to worms and viruses.
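
As a rough, non-authoritative sketch of the thresholding just described, the fragment below counts fixed-length substrings of each payload and reports a substring once its occurrence count and its numbers of unique sources and destinations all cross their thresholds. The fixed window length, the threshold values, and every name here are assumptions made for illustration; the disclosure does not specify them at this point, and this brute force form ignores strings that span packets.

```python
from collections import defaultdict

WINDOW = 8          # assumed substring length in bytes
PREVALENCE_MIN = 3  # assumed threshold on number of appearances
SOURCES_MIN = 2     # assumed threshold on unique source addresses
DESTS_MIN = 2       # assumed threshold on unique destination addresses

count = defaultdict(int)
sources = defaultdict(set)
dests = defaultdict(set)


def sift(src: str, dst: str, payload: bytes) -> set:
    """Update the tables for one packet; return substrings crossing all thresholds."""
    reported = set()
    # Each distinct substring is counted at most once per packet.
    windows = {payload[i:i + WINDOW] for i in range(len(payload) - WINDOW + 1)}
    for s in windows:
        count[s] += 1
        sources[s].add(src)
        dests[s].add(dst)
        if (count[s] >= PREVALENCE_MIN
                and len(sources[s]) >= SOURCES_MIN
                and len(dests[s]) >= DESTS_MIN):
            reported.add(s)
    return reported
```

Keeping a full table entry for every substring is memory-hungry, which is why a brute force form like this one is practical only when resources are plentiful.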

After reading this description it will become apparent to one skilled in the art how to implement the invention in various alternative embodiments and alternative applications. However, although various embodiments of the present invention will be described herein, it is understood that these embodiments are presented by way of example only, and not limitation. As such, this detailed description of various alternative embodiments should not be construed to limit the scope or breadth of the present invention as set forth in the appended claims.

FIG. 1A is a network diagram illustrating an example network 30 according to an embodiment of the invention. In the illustrated embodiment, a computer 10, sensor unit 20, and an aggregator unit 40 are part of and are communicatively coupled via the network 30. The network 30 may be a local network, a wide area network, a private network, a public network, a wired network or wireless network, or any combination of the above, such as the ubiquitous Internet.

Internet messages are sent in packets including headers that identify the destination and/or function of the message. An IP header identifies both source and destination for the payload. A TCP header may also identify destination and source port number. The port number identifies the service which is requested from the TCP destination in one direction, and from the source in the reverse direction. For example, port 25 may be the port number used commonly for e-mail; port number 80 is often used for HTTP (web traffic) and the like. The port number thus identifies the specific resources which are requested.

An intrusion is an attempt by an intruder to investigate or use resources within the network 30 based on messages over the network. A number of different systems are in place to detect and thwart such attacks. It has been recognized that there are commonalities between the different kinds of large-scale attacks: each attacks a different security hole, but all have something in common.

Typical recent attacks have large numbers of attackers. Typical recent attacks often increase geometrically, but in any case the number of infected machines increases. Attacks may also be polymorphic, that is, they change their content during each infection in order to thwart signature based methods.

The present systems and methods describe detecting patterns in data and using those patterns to determine the properties of a new attack. Effectively, this can detect an attack in the abstract, without actually knowing anything about the details of the attack. The detection of an attack can be used to generate a signature, allowing automatic detection of the attack. Another aspect describes certain current properties which are detected, to detect the attack.

A technique is disclosed which identifies characteristics of an abstract attack. This technique includes looking for properties in network data which make it likely that an attack of a new or previous type is underway.

The present disclosure describes a number of different properties being viewed; however, it should be understood that these properties could be viewed in any order, and other properties could alternatively be viewed, and that the present disclosure only describes a number of embodiments of different ways of finding an attack under way.

An aspect of the disclosed technique involves looking through large amounts of data that is received by the sensor 20 as shown in FIG. 1A. One embodiment discloses a truly brute force method of looking through this data; and this brute force method could be usable if large amounts of resources such as memory and the like are available. Another embodiment describes scalable data reduction techniques, in which patterns in the data are determined with reduced resources, e.g., smaller configurations of memory and processing.

The computer 10 may be any of a variety of types of computing devices such as a general purpose computer device. The computer 10 may be a user device or a server machine or any other type of computer device that performs a multi-purpose or dedicated service.

The sensor 20 is configured with a data storage area 22. The sensor 20 may be any of a variety of types of computing devices such as a general purpose computer device. The sensor 20 may be a stand alone unit or it may be integral with the computer 10 or the aggregator 40. There can be a single sensor 20 as shown or in other embodiments there can be a plurality of sensors that alone or collectively carry out the functions or a portion of the functions of the invention. The sensor 20 receives packets from the network 30 and analyzes the packets for indications of an attack. If a possible attack is detected, the sensor 20 can notify the aggregator 40, which can then take appropriate action.

Similarly, the aggregator 40 is configured with a data storage area 42 and may be any of a variety of types of computing devices such as the general purpose computer device. Additionally, there may be one or more aggregators 40 that alone or collectively carry out the functions or a portion of the functions of the invention.

FIG. 1B is a network diagram illustrating an alternative example network 60 according to an embodiment of the invention. In the illustrated embodiment, the computer 10 is communicatively coupled with an intrusion system 70 and a firewall 80 via the network 60. The computer 10 is also in communication with the Internet 90 via the firewall 80 or optionally through the intrusion system 70.

The intrusion system 70 is configured with a data storage area (not shown) and may be in communication with the firewall 80 via the network 60 or optionally through a direct communication link 75. The intrusion system 70 is also in communication with the Internet 90 through the firewall 80 or optionally directly through communication link 95. The intrusion system 70 preferably carries out the same function as the previously described sensor 20 and may be a stand alone unit or integrated with another device. In one embodiment, the intrusion system 70 can perform the combined functions of the previously described sensor 20 and the aggregator 40.

The intrusion system 70 may be any of a variety of types of computing devices such as a general purpose computer device. There may be a single intrusion system 70 as shown or there may be more than one that alone or collectively carry out the functions or a portion of the functions of the invention. In an embodiment, the intrusion system 70 may be integrated with the firewall 80 into a combined device 85. In such a case, the communication link 75 may take the form of shared memory or inter-process communication, as will be understood by one having skill in the art.

The firewall 80 is also configured with a data storage area (not shown) and may be any of a variety of types of computing devices such as a general purpose computer. Additionally, there may be one or more firewalls that alone or collectively carry out the functions or a portion of the functions of the invention described herein.

FIG. 2A is a functional block diagram illustrating an example sensor 20 according to an embodiment of the invention. In the illustrated embodiment, the sensor 20 is configured with a data storage area 22 and includes a communication module 100, a destination checker module 110, a content analysis module 120, and a signature module 130. The data storage area 22 may include both internal and external data storage and include volatile and non-volatile memory devices. The configuration of computing devices with various types of memory is well known in the art and will therefore not be discussed in detail herein.

The communication module 100 handles network communications for the sensor 20 and receives and processes packets appearing on the network interface (not shown). The communication module 100 may also handle communications with other sensors and one or more aggregators or computers. In one embodiment, when packets are received by the communication module 100, they are provided to the destination checker module 110, content analysis module 120, and signature module 130 for further processing in parallel.

The destination checker module 110 examines packets based on a special assumption that there is a known vulnerability in a destination machine. This makes the problem of detection much easier and faster. The destination checker module 110 analyzes the packets for known vulnerabilities such as buffer overflows at a specific destination port. For example, a list of destinations that are susceptible to known vulnerabilities is first consulted to check whether the destination of the current packet being analyzed is on the list. (Such a list can be built by a scan of the network prior to the arrival of any packets containing an attack and/or can be maintained as part of routine network maintenance.) If the specific destination is susceptible to a known vulnerability, then the packet intended for that destination is parsed to determine if the packet data conforms to the vulnerability. For example, in a buffer overflow vulnerability for a URL, the URL field is found and its length is checked to see if the field is over a pre-specified limit. If the packet is determined to conform to a known vulnerability, delivery of that packet can be stopped. Alternatively, the contents of the packet that exploit the vulnerability (for example, the contents of the field that would cause a buffer overflow) are forwarded as an anomalous signature, together with the destination and source of the packet. The contents may be forwarded, for example, to an aggregator 40 as previously described with respect to FIG. 1A so that a possible attack may be identified and stopped.
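
A minimal sketch of such a destination check, for the URL buffer overflow example, might look as follows. The vulnerability list, the length limit, the function name, and the assumption that the payload begins with a request line of the form "METHOD URL VERSION" are all hypothetical choices for this sketch.

```python
# Hypothetical list built by a prior scan: (destination IP, port) -> longest
# URL the vulnerable service can safely hold.
VULNERABLE = {("10.1.0.5", 80): 256}


def check_packet(dst_ip: str, dst_port: int, payload: bytes):
    """Return the offending URL field if the packet conforms to a known
    vulnerability at its destination, otherwise None."""
    limit = VULNERABLE.get((dst_ip, dst_port))
    if limit is None:
        return None  # destination is not on the vulnerability list
    parts = payload.split(b" ", 2)  # assumed "METHOD URL VERSION" layout
    if len(parts) < 2:
        return None  # no URL field found
    url = parts[1]
    return url if len(url) > limit else None
```

A non-None result corresponds to the field contents that would be forwarded as an anomalous signature, together with the packet's source and destination.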

Content analysis module 120 examines the content of a packet to determine if it meets criteria that are not necessarily based on a known vulnerability. For example, the content analysis module 120 may examine packets in the aggregate to determine if they contain repetitive content. It has been found that large attacks against network resources typically include content that repeats an unusual number of times. For example, the content could be TCP or IP control messages for denial of service attacks. By contrast, worms and viruses have content that contains the code that forms the basis of the attack, and hence that code is often repeated as the attack propagates from computer to computer. Spam has repeated content that contains the information the spammer wishes to send to a large number of recipients.

Advantageously, only frequently repeated content (signatures) is likely to be a problem. For example, a signature that repeats just once could not represent a large-scale attack. At most, it represents an attack against a single machine. Therefore, the frequent signatures may be further analyzed by the content analysis module 120 to determine whether each is truly a threat, or is merely part of a more benign message.

The signature module 130 analyzes packet data to determine what signatures, if any, are included in the data payload. The signature module 130 may examine individual packets to find signatures or it may examine the data within a single packet and across packets to find signatures that extend across packet boundaries. The signature module 130 may work in concert with the other modules in the sensor 20 to provide them with information about signatures in packets.

FIG. 2B is a functional block diagram illustrating an example content analysis module 120 according to an embodiment of the invention. In the illustrated embodiment, the content analysis module 120 is configured with a data storage area 22 and includes a spreading module 122, a correlation module 124, an executable code detection module 126, and a scanning module 128.

The spreading module 122 is configured to determine whether a large (where “large” is defined by thresholds that can be set to any desired level) number of attackers or attacked machines are involved in sending/receiving the same content. The content is “common,” in the sense that the same frequent signatures are being sent. During a large-scale attack, the number of sources or destinations associated with the content may grow geometrically. This is in particular true for worms and viruses. For spam, the number of destinations to which the spam content is sent may be relatively large; at least for large-scale spam. For denial of service attacks, the number of sources may be relatively large. Therefore, spreading content may be an additional factor representing an ongoing attack.

When a frequent signature is detected, the spreading module 122 investigates whether the content exhibits characteristics of spreading. This can be done, for example, by looking for and counting the number of sources and destinations associated with the content.

In a brute force example, a table of all unique sources and all unique destinations is maintained. Each piece of content is investigated to determine its source and its destination. For each string S, a table of sources and a table of destinations are maintained. Each unique source or destination may increment respective counters. These counters maintain a count of the number of unique sources and unique destinations.

When the same string S comes from the same source, the counter is not incremented. When that same string does come from a new source, the new source is added as an additional source and the unique source counter is incremented. The destinations are counted in an analogous way. The source table is used to prevent over-counting the number of sources. That is, if Sally continually sends the message “hi Joe”, Sally does not get counted twice.
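
The brute force bookkeeping for a string S can be illustrated as follows. The class and method names are hypothetical, and using a set per string (whose size serves as the unique counter) is an implementation choice of this sketch, not something the disclosure prescribes.

```python
class SpreadTable:
    """Per-string tables of the sources and destinations seen so far."""

    def __init__(self):
        self.sources = {}  # string S -> set of sources that sent S
        self.dests = {}    # string S -> set of destinations that received S

    def record(self, s: bytes, src: str, dst: str) -> None:
        # Adding an address already in the set changes nothing, so a repeat
        # sender or receiver is never counted twice.
        self.sources.setdefault(s, set()).add(src)
        self.dests.setdefault(s, set()).add(dst)

    def counts(self, s: bytes) -> tuple:
        """Return (unique sources, unique destinations) for string S."""
        return len(self.sources.get(s, ())), len(self.dests.get(s, ()))


table = SpreadTable()
for _ in range(3):
    table.record(b"hi Joe", "Sally", "Joe")   # Sally repeats herself
table.record(b"hi Joe", "Bob", "Joe")         # a new source appears
```

After these calls, the string "hi Joe" has two unique sources and one unique destination, however many times Sally sent it.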

The frequent and spreading signatures found by the spreading module 122 can also be subjected to additional checks such as a check for executable code, spam, backdoors, scanning, and correlation. Each of these checks, and/or additional checks, can be carried out by modules, either software based, hardware based, or any combination thereof.

The correlation module 124 examines the source and destination of multiple packets to determine if an interval pattern is present. For example, a piece of content may be sent to a set of destinations in a first measured interval. In a later second measured interval, the same piece of content is sent by some fraction of these destinations acting as sources of the content. Such correlation can imply causality wherein infections sent to destinations in the first interval are followed by these stations acting as infecting agents in the later interval.

In one embodiment, a correlation test can be used to scalably detect the correlation between content sent to stations in one interval, and content sent by these stations in the next interval. Passing the correlation test is a likely sign of an infection, and adds to the guilt score assigned to a piece of content.

For example, a bitmap for source addresses and a bitmap for destination addresses are initialized to “0” whenever a new signature is detected and added to what may be referred to as a frequent content table. A similar initialization occurs at the end of every time interval to reset the frequency. The concepts used are very similar to those described herein for detecting spreading content, where similar bitmap structures can be used.

Thus, when a new signature is detected, the source IP address is hashed into the source bitmap and the destination IP address is analogously hashed into the destination bitmap. The bit positions set in the source bitmap for this interval are then compared with the bit positions set in the destination bitmap for the previous interval. If a large number of set bits are in common, it indicates that a large number of the destinations that received the content in the last interval are sending the same content in this interval. Accordingly, the correlation module 124 would identify that content as passing the correlation test.
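
The bitmap comparison can be sketched as below. The bitmap width, the use of CRC-32 as the hash, the function names, and the threshold parameter are assumptions for this sketch; the disclosure does not name a particular hash function here. Because different addresses may hash to the same bit, the count of common bits is only an estimate.

```python
import zlib

BITS = 64  # assumed bitmap width


def bit_for(ip: str) -> int:
    """Hash an address to a single bit position in the bitmap."""
    return 1 << (zlib.crc32(ip.encode()) % BITS)


def build_bitmap(ips) -> int:
    """Hash every address seen in one interval into one bitmap."""
    bitmap = 0
    for ip in ips:
        bitmap |= bit_for(ip)
    return bitmap


def passes_correlation(prev_dst_bitmap: int, cur_src_bitmap: int,
                       min_common: int) -> bool:
    """True if enough of last interval's destinations now act as sources."""
    common = bin(prev_dst_bitmap & cur_src_bitmap).count("1")
    return common >= min_common
```

In use, the destination bitmap from the previous interval is retained, the source bitmap for the current interval is built, and the two are compared at the end of the interval before both are reset.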

Another example correlation test is a spam test conventionally known as the Bayesian spam test. The Bayesian test may heuristically analyze the content to determine if the suspected content is in fact spam according to the Bayesian rules.

The executable code detection module 126 detects the presence of executable code segments. The presence of executable code segments may also be an additional (but not necessary) sign of an attack. Worms and certain other attacks are often characterized by the presence of code (for example, code that can directly execute on Windows machines) in the attack packets they send. Therefore, in analyzing content to determine an infestation, the repeatable content is tested against parameters that determine executable code segments. It is unlikely that reasonably large segments of contiguous packet data will accidentally look like executable code; this observation is the basis of special techniques for determining the presence of code. In one aspect, a check is made for Intel 8086 and Unicode executable code formats.

In one embodiment, the executable code detection module 126 is configured to test each suspicious data segment that is identified. For example, a data segment starting at the beginning of a packet, at an offset, or spanning across packets can be tested for executable code. When code is detected to be over a specified length, the executable code detection module 126 reports a positive code test, for example to the sensor or intrusion system.

A variety of different code tests can be employed by the executable code detection module 126. For example, a particular code test can simply be a disassembler run on the packet at each of a plurality of offsets. Most worms and the like use 8086 code segments. Therefore, an 8086 disassembler can be used for this purpose.

Alternatively, a technique of looking for opcodes and associated information can be used as a code test. The opcodes may be quite dense, leaving only a few codes that are clearly not 8086 codes. Each opcode may have associated special information following the code itself. While a small amount of data may look like code, because of the denseness of the opcodes, it is quite unlikely that large strings of random data look like codes. For example, if 90% of the opcodes are assigned, a random byte of data has a 90% chance of being mistaken for a valid opcode; however, it is unlikely that, measured over 40 bytes of data, every one of the appropriate bytes looks like a valid opcode.
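
To put a number on this, the short calculation below makes the simplifying assumptions (introduced here, not stated in the text) that all 40 bytes are tested as opcodes and that each random byte is independently valid with probability 0.90.

```python
p_valid = 0.90   # chance that one random byte looks like a valid opcode
n_bytes = 40     # number of bytes that must all look valid

# Probability that every one of the tested bytes looks like a valid opcode.
p_all_valid = p_valid ** n_bytes
print(f"{p_all_valid:.4f}")  # prints 0.0148
```

Under these assumptions only about 1.5% of random 40-byte strings would pass. In practice fewer than 40 bytes are opcode positions, since operand bytes are skipped, so the real false-positive rate would be higher than this figure.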

This test, therefore, maintains a small table of all opcodes, and for each valid opcode the test uses the length of the instruction to test whether the bits are valid. In one example, for an opcode test of length N, the code test starts at offset O and repeats a length test until the tested length reaches N. At each step, the byte at the current offset is looked up in the opcode table, along with its instruction length. If the opcode table indicates that the byte is invalid, the code test would fail. If the opcode table entry is valid, the length test is incremented by the opcode table entry length and the code test would continue. The system thus checks for code at offset O by consulting the table looking for a first opcode at O. If the opcode is invalid, then the test fails, and the pointer moves to test the next offset. However, if the opcode is valid, then the test skips the number of bytes indicated by the instruction length, to find the next opcode, and the test repeats. If the test has not failed after reaching N bytes from the offset O, then the code test has succeeded.
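
The table-driven test just described can be sketched as follows. The three-entry opcode table is a toy stand-in chosen for this sketch, not a real 8086 opcode table, and the function names are hypothetical. A real table would also need to account for variable-length instructions whose length depends on later bytes.

```python
# Toy opcode table (illustrative only): opcode byte -> instruction length.
TOY_OPCODES = {
    0x90: 1,  # NOP
    0xB8: 3,  # MOV AX, imm16
    0xC3: 1,  # RET
}


def looks_like_code(data: bytes, offset: int, n: int, opcode_len: dict) -> bool:
    """Walk instruction by instruction from offset; succeed after n valid bytes."""
    pos = offset
    while pos - offset < n:
        if pos >= len(data):
            return False  # ran out of data before covering n bytes
        length = opcode_len.get(data[pos])
        if length is None or pos + length > len(data):
            return False  # invalid opcode, or instruction is truncated
        pos += length     # skip the operand bytes to reach the next opcode
    return True


def find_code_offsets(data: bytes, n: int, opcode_len: dict) -> list:
    """Try every offset in turn, as the test does when an offset fails."""
    return [o for o in range(len(data)) if looks_like_code(data, o, n, opcode_len)]


sample = bytes([0x90, 0xB8, 0x34, 0x12, 0x90, 0xC3])
offsets = find_code_offsets(sample, 4, TOY_OPCODES)
```

For the sample bytes, the walk succeeds only from offsets 0 and 1; offsets 2 and 3 land on operand bytes that are not in the toy table, and offsets 4 and 5 run out of data before covering four bytes.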

This test can be carried out on each string, using 8086 and Unicode, since most of the attacks have been written in these formats. It should be understood, however, that this may be extended to other code sets where desirable to do so.

As previously described, the code test can be combined with the frequent content test or other tests to confirm whether a piece of frequent content contains at least one fragment of code. In an alternative embodiment, the code detection test can be used as a threshold test prior to the other tests such as the frequent content test. In such an embodiment, only content that has a code segment of size N or more would be considered for frequent content testing.

The scanning module 128 is configured to determine whether IP addresses or ports are being probed for potential vulnerability. For example, it may be necessary for an attacker to communicate with vulnerable sources in order to launch an attack. Scanning may be used by the attacker or worm/virus to find valid IP addresses to probe for vulnerable services. Probing of unused addresses and/or ports can be used by the attacker to make this determination. However, it is possible that future attacks may also modify their propagation strategies to use pre-generated addresses instead of probing. Accordingly, one embodiment uses scanning only as an additional sign of an attack which is not necessary to output an anomalous signature.

In one embodiment, a scanning test is employed that, unlike conventional scanning systems, uses both the content and the source as keys for the test. Conventional systems tested only the source address. In the scanning test, tests are made for content that is being sent to unused addresses (of sources that disburse such content and send to unused addresses) and not solely sources. A guilt score is assigned to pieces of “bad” content, though as a side-effect, the individual stations disbursing the bad content may also be tagged. Notice also that the exploit in a TCP-based worm will not be sent to these addresses because a connection cannot be initiated without an answer from the victim.

In one embodiment, the scanning module 128 looks for a range of probes to an unused space. For example, a source address may make several attempts to communicate with an inactive address or port by mistake. A hundred attempts to a single unused address or port is less suspicious than a single attempt to each of a hundred unused addresses/ports. Thus, rather than counting just the number of attempts to unused addresses, the scanning module 128 may also make an estimate of the range of unused addresses that have been probed.

To implement these augmentations scalably, a representation of the set of the unused addresses/ports of an enterprise or campus network is maintained by the scanning module 128. For scalability, unused addresses can be represented compactly using a bitmap (for example, for a Class B network, 64K bits suffices) or a Bloom Filter (described in Fan, et al., Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol, SIGCOMM 98, 1998). The list can be dynamically validated. Initial guesses about which address spaces are being used can be supplied by a manager. This can easily be dynamically corrected. For example, whenever an address S thought to be unassigned sends a packet from the inside, that address should be updated to be an assigned address. Note that in the special case of a contiguous address space, a simple network mask suffices.

A scalable list of unused ports can be kept by keeping an array with one counter for each port. The counter is incremented for every TCP SYN sent or each RESET sent, and decremented for every TCP FIN or FIN-ACK sent. Thus, if a TCP-based attack occurs to a port and many of the machines it contacts are not using this port, TCP FINs will not be sent back by these machines, or they will send TCP resets. Thus, the counter for that port will increase. Some care must be taken in implementing this technique to handle spoofing and asymmetrical routing, but even the simplest instance of this method will work well for most organizations.
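The per-port counter array might be maintained along the following lines. This is a simplified sketch: the flag labels ("SYN", "RST", "FIN", "FIN-ACK") and the function name tally_ports are assumptions made for illustration, and spoofing and asymmetrical routing are not handled.

```python
INCREMENT_FLAGS = {"SYN", "RST"}      # assumed labels for SYN and RESET segments
DECREMENT_FLAGS = {"FIN", "FIN-ACK"}  # assumed labels for FIN and FIN-ACK segments


def tally_ports(events):
    """Build the per-port counter array from (port, flag) observations."""
    counters = [0] * 65536  # one counter for each possible TCP port
    for port, flag in events:
        if flag in INCREMENT_FLAGS:
            counters[port] += 1
        elif flag in DECREMENT_FLAGS:
            counters[port] -= 1
    return counters
```

A port that is in normal use tends to see its SYNs balanced by FINs, so its counter stays near zero, while a port probed on machines that do not use it accumulates a rising count.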

A “blacklist” of sources that have sent packets to the unused addresses or ports in the last k measurement periods can also be maintained. This can be done compactly via a Bloom Filter or a bitmap. A hashed bit map (similar to counting sources above) can also be maintained of the inactive destinations probed, and the ports for which scanning activity is indicated.

For each piece of frequent content, the mechanism keeps track of the range of sources in the blacklisted list associated with the content. Once again, this can be done scalably using a hashed bitmap as described herein. In one embodiment, testing for content of scanning can be implemented by hashing the source address of a suspicious signature S into a position within the bit map. When the number of bits set within that suspicion bit map exceeds a threshold, then the scanning is reported as true.

Note that while worms may evince themselves by the presence of reasonably large code fragments, other attacks such as Distributed Denial of Service may be based on other characteristics such as large amounts of repetition, large number of sources, and the reception of an unusually large number of TCP reset messages. The content analysis module 120 may identify spam, for example, as being characterized by repetitive presence of keywords identified based on heuristic criteria. These additional checks for spreading, correlation, executable code, scanning, and spam can be optional such that one or more or none of these tests may be used.

FIG. 2C is a functional block diagram illustrating an example signature module 130 according to an embodiment of the invention. In the illustrated embodiment, the signature module 130 is configured with a data storage area 22 and includes a parser module 132, a filter module 134, a key module 136, and a data module 138. The signature module 130 is configured to examine packets for signatures that appear within a single packet or are spread across packets. The signature module 130 preferably works in connection with the other modules of the sensor 20 to detect a possible attack.

In an embodiment, the signature module 130 can perform a brute force examination of each packet that is received. It should be understood, however, that the brute force method of analyzing content could require very large amounts of data storage. For example, commonly used intrusion systems/sensors that operate at 1 Gigabit per second easily produce terabytes of packet content over a period of a few hours. Accordingly, a general data reduction technique may be used. It should be understood, however, that other detection techniques may be used without a general data reduction technique. Thus, in an embodiment, a data reduction technique can advantageously be used as part of those detection techniques that generate large amounts of data, such as signatures and source/destination addresses and ports.

In one aspect, a signature for a possible attack (also referred to as an “anomalous signature”) may be established when any frequent content is found that also meets an additional test such as spreading, correlation, executable code segments, or any other test. According to another aspect, the signatures may be scored based on the amount of indicia they include. In any case, this information is used to form anomalous signatures that may then be used to block operations or may be sent to a bank of signature blockers and managers such as the aggregator 40 previously described with respect to FIG. 1A.

In addition to the signature, if a packet signature is deemed to be anomalous according to the tests above, the destination and source of the packet may be stored. This can be useful, for example, to track which machines in a network have been attacked, and which ones have been infected.

An intrusion detection system (or sensor) device may also (in addition to passing the signature, source, destination, or other information) take control actions by itself. Standard control actions that are well known in the state of the art include connection termination (where the TCP connection containing the suspicious signature is terminated), connection rate limiting (where the TCP connection is not terminated but slowed down to reduce the speed of the attack), and packet dropping (where any packet containing the suspicious content is dropped with a certain probability). Note that when an attack is based on a known vulnerability, packet dropping with probability 1 can potentially completely prevent an attack from coming into a network or organization.

The signature module 130 is configured to identify a signature S from within any subset of the packet data payload and/or header. In general, a signature can be any subset of the data payload. A signature can also be formed from any portion of the data payload added to or appended to information from the packet header, for example, the TCP destination (or source) port. This type of signature recognizes that many attacks target specific destination (or in some limited cases, source) ports. An offset signature is based on the recognition that modern large-scale attacks may become polymorphic, that is, may modify the content on individual attack attempts. This is done to make each attack attempt look like a different piece of content. Complete content change is unlikely, however. Some viruses add small changes, while others encrypt the virus but add a decryption routine to the virus. Each contains some common piece of content; in the encryption example, the decryption routine would be the common piece of content.

The attack content may lie buried within the packet content and may be repeated, but other packet headers may change from attack to attack. Thus, according to another embodiment, the signature is formed by any continuous portion in the data payload, appended to the TCP destination port. Therefore, the signature module 130 looks for repeated content strings anywhere within the TCP payload. For example, the text “hi Joe” may occur within packet 1 at offset 100 in a first message, and the same text “hi Joe” may occur in packet 2 at offset 200. The signature module 130 allows for counting these as two occurrences of the same string despite the different offsets in each instance.

The evaluation of this occurrence is carried out by evaluating all possible substrings in the packet of a certain length. A value of a substring length can be chosen, for example, 40 bytes. Then, the data payload of each piece of data coming in may be windowed, to first look for bytes 1 through 40, then look for bytes 2 through 41, then look for bytes 3 through 42. All possible offsets are evaluated.
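The windowing over all offsets can be sketched as below. The function names windows and count_substrings are hypothetical, and the sketch counts raw substrings in memory rather than applying any of the data reduction techniques described elsewhere in this disclosure.

```python
from collections import Counter


def windows(payload: bytes, length: int = 40):
    """Yield (offset, substring) for every fixed-length window in the payload."""
    for offset in range(len(payload) - length + 1):
        yield offset, payload[offset:offset + length]


def count_substrings(payloads, length: int = 40) -> Counter:
    """Count each fixed-length substring across payloads, regardless of offset."""
    counts = Counter()
    for payload in payloads:
        for _, substring in windows(payload, length):
            counts[substring] += 1
    return counts
```

Because the offset is discarded when counting, the same string appearing at offset 100 in one packet and offset 200 in another contributes two occurrences of a single key.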

Determining the length of substrings that are evaluated is a trade-off depending on the desired amount of processing. Longer substrings will typically have fewer false positives, since it is unlikely that randomly selected substrings can create repetitions of a larger size. On the other hand, shorter substrings may make it more difficult for an intruder to evade detection.

Certain attacks may chop the attack into portions which are separated by random filler. However, these separated portions will still be found as several invariant content substrings within the same packet. In such an attack, a multi-signature may be identified by the signature module 130. A multi-signature may comprise one or more continuous portions of payload combined with information from the packet header such as the destination port.

Other attacks may break the attack into portions that are separated across two or more packets. In such an attack, when the packets are received and placed in order, the data payloads can be examined such that predetermined sized strings that span adjacent packets are analyzed for invariant content substrings that cross packet boundaries. Thus, an inter-packet signature may be identified that comprises a portion of payload from a first packet with a portion of payload from a second packet. Furthermore, the two source packets for the inter-packet signature are preferably adjacent when reordered.

The parser module 132 receives packets and parses the header and data payload from the packet. The parser module 132 additionally extracts information from the packet header such as the protocol, the source IP address, the destination IP address, the source IP (or UDP) port, and the destination IP (or UDP) port, to name a few. The parser module 132 also breaks down the data payload into predetermined sized strings for further processing by other components of the sensor. As described, the predetermined sized strings may extend across packet boundaries such that a single predetermined sized string may have a portion of its content from a first packet and a portion of its content from a second, adjacent packet.

The filter module 134 may be implemented in hardware as a series of parallel processors or application specific integrated circuits. Alternatively, the filter module 134 may be implemented in software that includes one or more routines. Advantageously, the software may be threaded so that the filtering process implemented in software is also a parallel process to the extent allowed by the associated hardware on which the software is running. The function of the filter 134 is to optimally reduce the number of predetermined sized strings that are processed while maintaining high efficacy for virus detection, as described later in detail with respect to FIG. 6.

The key manager 136 identifies the invariant strings from the data payload that may qualify as a signature, for example, due to their repetitive nature, inclusion of code segments, matching a predetermined string, etc. The key manager 136 may combine information from the packet header with an identified string of content from the packet data payload to create a key. Each key is possibly a worm or virus signature. Alternatively, the key manager 136 may create a key from the string of content alone or from the string of content in combination with other information selected from the packet header such as the destination IP address or the destination port. In an embodiment, the key manager 136 performs data reduction on the key to minimize the size of the key.

In one embodiment, a data reduction technique called hashing may be employed. Hashing is a set of techniques to convert a long string or number into a smaller number. A simple hashing technique is to remove all but the last three digits of a large number. Since the last three digits of the number are effectively random, it is an easy way to characterize something that is referred to by a long number. For example, U.S. Pat. No. 6,398,311 can be described simply as the 311 patent. However, much more complex and sophisticated forms of hashing are known.

In one example, assume the number 158711, and that this number must be assigned to one of 10 different hashed “bins” by hashing the number to one of 10 bins. One hashing technique simply adds the digits: 1+5+8+7+1+1 equals 23. The number 23 is still bigger than the desired number of 10. Therefore, another reduction technique is carried out by dividing the final number by 10, and taking the remainder (“modulo 10”). The remainder of 23 divided by 10 is 3. Therefore, 158711 is assigned to bin 3. In this technique, the specific hash function is: (1) add all the digits; and (2) take the remainder when divided by 10.

The same hash function can be used to convert any string into a number between 0 and 9. Different numbers can be used to find different hashes.

The hash function is repeatable, that is, any time the hash function receives the number 158711, it will always hash to bin 3. However, other numbers will also hash to bin 3. Any undesired string in the same bin as a desired string is called a hash collision.
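The digit-sum example above can be written out directly. The function name digit_sum_hash is introduced here only for illustration.

```python
def digit_sum_hash(number: int, bins: int = 10) -> int:
    """Add all the digits, then take the remainder when divided by `bins`."""
    return sum(int(digit) for digit in str(number)) % bins


# 1+5+8+7+1+1 = 23, and 23 modulo 10 is 3
assert digit_sum_hash(158711) == 3
# A different number landing in the same bin is a hash collision
assert digit_sum_hash(12) == 3
```

Changing the `bins` argument gives a different hash of the same input, which is the sense in which different numbers can be used to find different hashes.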

Many other hash functions are known, and can be used. These include Cyclic Redundancy Checks (CRCs) commonly used for detecting errors in packet data in networks, a hash function based on computing multiples of the data after division by a pre-specified modulus, the so-called Carter-Wegman universal hash functions (the simplest instantiation of which is to multiply the bit string by a suitably chosen matrix of bits), hash functions such as Rabin hash functions based on polynomial evaluation, and one-way hash functions such as MD-5 used in security. This list is not exhaustive and it will be understood that other hash functions and other data reduction techniques can be used.

A data reduction technique that is advantageous to use with the data payload subsections 230 described with respect to FIG. 3 allows adding a part of the hash and removing a part when moving between two adjacent subsections. One aspect of this embodiment, therefore, may use an incremental hash function. Incremental hash functions make it easy to compute the hash of the next substring based on the hash of the previous substring. One classic incremental hash function is a Rabin hash function (used previously by Manber in spotting similarities in files), which may be used instead of non-incremental hashes (e.g., SHA, MD5, CRC32).

A large data payload may contain thousands of bytes. Accordingly, to minimize the probability of hash collisions (where different source strings result in the same value after hashing) the data reduction may be, for example, a hash to 64 bits.

The string S that is hashed may also include information about the destination port. The destination port generally remains the same for a worm, and may distinguish frequent email content from frequent Web content or peer-to-peer traffic in which the destination port changes.

In an embodiment, use of the Rabin hash function (also called the Rabin fingerprint) advantageously simplifies the analysis of data across packets. In an embodiment, the last 40 byte subsection of the data payload of a packet is stored after the packet processing is complete. The Rabin fingerprint for that subsection is also stored. When the next data payload is analyzed, the Rabin fingerprint is computed for the 40 byte subsection that includes the last 39 bytes of the previous packet and the first byte from the new packet. In this fashion, the packets may be examined and analyzed as a continuous stream of data, across packet boundaries. This allows the detection of an attack that spreads invariant strings across packets.
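The incremental property and the carry-over across packet boundaries can be illustrated with a rolling hash. Note the substitution: the sketch below uses a Rabin-Karp style polynomial hash over integers modulo a prime, standing in for a true Rabin fingerprint over polynomials modulo an irreducible polynomial. BASE, MOD, RollingHash, and direct_hash are all assumptions introduced for this example.

```python
from collections import deque

BASE = 257            # assumed polynomial base
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)


class RollingHash:
    """Rolling hash over the last `window` bytes of a byte stream."""

    def __init__(self, window: int = 40):
        self.window = window
        self.buf = deque(maxlen=window)        # bytes kept across packets
        self.value = 0
        self.top = pow(BASE, window - 1, MOD)  # weight of the oldest byte

    def push(self, byte: int):
        """Slide by one byte; return the hash once the window is full, else None."""
        if len(self.buf) == self.window:
            # Remove the contribution of the byte that is about to leave
            self.value = (self.value - self.buf[0] * self.top) % MOD
        self.buf.append(byte)  # the deque drops the oldest byte automatically
        self.value = (self.value * BASE + byte) % MOD
        return self.value if len(self.buf) == self.window else None


def direct_hash(data: bytes) -> int:
    """Non-incremental reference computation of the same hash."""
    value = 0
    for byte in data:
        value = (value * BASE + byte) % MOD
    return value
```

Feeding successive payloads through one RollingHash instance keeps the trailing bytes of the previous packet in the window, so the first hash produced from a new packet already spans the packet boundary.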

After a signature or key is created, the data manager 138 processes the signature. In an embodiment, the signature is subjected to a frequent signature test. Each key can be stored in a database. For example, the data manager 138 may maintain a content prevalence table and an address dispersion table (described later with respect to FIGS. 4 and 5, respectively). The content prevalence table includes entries for keys and the number of times the particular key has been encountered (“count”). If a newly generated key is not present in the address dispersion table, the key is placed in the content prevalence table for tracking of the number of times the key is encountered. When the count for a particular key in the content prevalence table exceeds a predetermined threshold, the data manager 138 moves the key into the address dispersion table. In an embodiment, the content prevalence and address dispersion tables may be periodically flushed or specific entries may individually expire after a predetermined time period.

FIG. 3 is a block diagram illustrating an example packet 200 according to an embodiment of the present invention. In the illustrated embodiment, the packet 200 comprises a header 210 and a data payload 220. The header 210 typically includes information relevant to the packet 200 such as the protocol by which the packet should be processed, the source IP address, the source IP port, the destination IP address, and the destination IP port. Other information may also be advantageously located in the header 210.

The data payload 220 can be very large and is preferably divided up into smaller, more manageable sized chunks, for example by the aforementioned parser. These more manageable sized chunks are shown as payload subsections 230. The size of a payload subsection can vary and is preferably optimized based on the processing power of the sensor 20, available memory 22, and other performance or result oriented parameters. In one embodiment, the size of a payload subsection 230 is 40 bytes.

Alternatively, the data payload subsections can be all of the contiguous strings in the data payload of any length. Or the subsections may be all of the contiguous strings in the data payload with the same length. Other possible combinations of data payload subsections may also be employed as will be understood by those skilled in the art. In a preferred embodiment, each subsection is 40 bytes, with the first subsection comprising bytes 1-40; the second subsection comprising bytes 2-41; the third subsection comprising bytes 3-42; and so on until each byte in the data payload is included in at least one subsection.

FIG. 4 is a block diagram illustrating an example content prevalence table 250 according to an embodiment of the invention. In the illustrated embodiment, each row of the content prevalence table 250 includes a key and a count. For example, the count may represent the number of times the specific key has been encountered. As previously described, the key may be a string from the data payload of a packet and may also include the protocol and/or destination port information from the packet header. Alternatively, the key may be a representation of the string from the data payload (or the string combined with header information) after a data reduction has been performed.

In an embodiment, the data manager 138 (previously described with respect to FIG. 2C) may maintain the content prevalence table 250. For example, when a new key is identified, the key is looked up in the content prevalence table 250. If the key is not in the table, it is added to the table along with a count of 0. Alternatively, if the key is already in the table, then the count associated with the key is incremented.

Additionally, a frequency threshold can also be defined. Thus, if the count for a particular key exceeds the frequency threshold, then the key is identified as a frequent or repetitive key. In an alternative embodiment, a time threshold may also be defined for each entry in the content prevalence table 250. Accordingly, when the time threshold is reached for a particular entry, the counter can be reset so that the frequent content test effectively requires the key to be identified a certain number of times during a specified time period.

FIG. 5 is a block diagram illustrating an example address dispersion table 270 according to an embodiment of the invention. In the illustrated embodiment, each row of the address dispersion table 270 includes a key and a count of the unique source IP addresses and a count of the unique destination IP addresses associated with the key. When a particular key in the content prevalence table 250 is identified as being a frequent or repetitive key, the data manager preferably creates an entry in the address dispersion table 270 for that key. Alternatively, when the key manager identifies a key that already exists in the address dispersion table 270, the relative counts for the unique source IP address and the unique destination IP address are updated if necessary.
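The interplay between the content prevalence table and the address dispersion table can be sketched as follows. This is an uncompressed illustration under stated assumptions: the class name KeyTracker and its methods are hypothetical, the first sighting of a key is counted as 1, and exact Python sets are used for unique addresses in place of the compact bitmaps described in this disclosure.

```python
class KeyTracker:
    """Content prevalence and address dispersion tables (exact, uncompressed)."""

    def __init__(self, prevalence_threshold: int):
        self.prevalence_threshold = prevalence_threshold
        self.prevalence = {}   # key -> number of times the key has been seen
        self.dispersion = {}   # key -> (set of source IPs, set of destination IPs)

    def observe(self, key, src: str, dst: str) -> None:
        """Record one sighting of `key` sent from `src` to `dst`."""
        if key in self.dispersion:
            sources, destinations = self.dispersion[key]
            sources.add(src)
            destinations.add(dst)
            return
        count = self.prevalence.get(key, 0) + 1
        if count > self.prevalence_threshold:
            # Frequent key: move it into the address dispersion table
            self.prevalence.pop(key, None)
            self.dispersion[key] = ({src}, {dst})
        else:
            self.prevalence[key] = count

    def dispersion_counts(self, key):
        """Return (unique sources, unique destinations) for a frequent key."""
        sources, destinations = self.dispersion.get(key, (set(), set()))
        return len(sources), len(destinations)
```

A key would then be reported as a suspected signature once both dispersion counts exceed their respective thresholds; those thresholds are left to the caller in this sketch.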

Because the tables illustrated in FIGS. 4 and 5 may become quite large in practice, data reduction techniques may be performed to manage the content prevalence and the address dispersion tables. For example, a data reduction hash may be performed on one or both of the tables 250 and 270.

In an embodiment, an optional front end test such as a Bloom Filter (described in Burton Bloom: Space/Time Tradeoffs In Hash Coding With Allowable Errors; Communications ACM, 1970) or a counting Bloom Filter (described in Fan, et al., Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol, SIGCOMM 98, 1998) may be used to sieve out content that is repeated only a small number of times.

FIG. 6 is a functional block diagram illustrating an example hashing technique according to an embodiment of the invention. In general, for a more scalable storage of content in the content prevalence and/or address dispersion tables, a certain number k of hash stages are established. Each stage I hashes the value S using a specified hash function Hash(I), where Hash(I) is a different hash function for each stage I. For each of those stages, a specific position, k(I), is obtained from the hashing. The counter in position k(I) is incremented in each of the k stages. Then, the next I is established. Again, there are k stages, where k is often at least three, but could even be 1 or 2 in some instances.

The data reduction hashing system checks to see if all of the k stage counters that are incremented by the hash for a specific string S are greater than a stage frequency threshold. S is added to the frequent content table only when all of the k counters are greater than the threshold.

Specifically with respect to FIG. 6, when k=3, the data reduction technique would be called a 3 stage hash. Each stage is a table of counters which is indexed by the associated hash function (Hash(I)) that is computed based on the packet content. At the beginning of each measurement interval, all counters in each stage are initialized to 0. Each packet comes in (e.g., Packet S) and is hashed by a hash function associated with the stage. The result of the hash is used to set a counter in that stage.

For example, the packet S is hashed by a stage 1 hash function. This produces a result of 2, shown incrementing counter 2 in the stage 1 counter array. The same packet S is also hashed using the stage 2 hash function, which results in an increment to counter 0 in the stage 2 counter array. Similarly, the packet S is hashed by the stage 3 hash function, which increments counter 6 of the stage 3 counter array. In this example, the same packet hashes to three different sections (in general, though there is a small probability that these sections may coincide) in the three different counter stages.

After the hashing, the stage detector 290 determines if the counters that have currently been incremented are each above the frequency threshold. The signature is added to the frequent content memory 295 only when all of the stages have been incremented above the stage frequency threshold.

As examples, the stage 1 hash function could sum digits and take the remainder when divided by 13. The stage 2 hash function could sum digits and take the remainder when divided by 37, and the stage 3 hash function could be a third independent function. In an embodiment, parameterized hash functions may be used, with different parameters for the different stages, to produce different but independent instances of the hash function.

The use of multiple hash stages with independent hash functions reduces the problems caused by multiple hash collisions. Moreover, the system is entirely scalable. By simply adding another stage, the effect of hash collisions is geometrically reduced. Moreover, since the memory accesses can be performed in parallel, this can form a very efficient, multithreaded software or hardware implementation.
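The multi-stage hashing of FIG. 6 can be sketched as below. The class name MultistageFilter is hypothetical, and the per-stage hash functions are assumed to be SHA-256 salted with the stage index, which is one way to obtain parameterized, effectively independent hash functions and is not the hash prescribed by this disclosure.

```python
import hashlib


class MultistageFilter:
    """k counter arrays, each indexed by its own hash of the key."""

    def __init__(self, stages: int = 3, counters_per_stage: int = 1024,
                 threshold: int = 3):
        self.threshold = threshold
        self.tables = [[0] * counters_per_stage for _ in range(stages)]

    def _position(self, stage: int, key: bytes) -> int:
        # Salting with the stage index gives a different hash per stage
        digest = hashlib.sha256(bytes([stage]) + key).digest()
        return int.from_bytes(digest[:8], "big") % len(self.tables[stage])

    def add(self, key: bytes) -> bool:
        """Increment one counter per stage; True if all exceed the threshold."""
        frequent = True
        for stage, table in enumerate(self.tables):
            position = self._position(stage, key)
            table[position] += 1
            if table[position] <= self.threshold:
                frequent = False
        return frequent
```

An infrequent key is reported as frequent only if it collides with heavily counted keys in every stage at once, which is why adding a stage reduces the false positive rate geometrically.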

Advantageously, the bits in the individual stage counter arrays can be weighted by the probability of hash collisions, in order to get a more accurate count. When counting source and destination IP addresses, the weighting provides a more accurate count of the number of unique sources and destinations. Additionally, when applied to counting IP addresses, this technique effectively creates and stores a bitmap, where each bit represents an IP address. Advantageously, the storage requirements are significantly reduced, rather than storing the entire 32-bit IP address in an address table.

While the bitmap solution is better than storing complete addresses, it still may require keeping hundreds of thousands of bits per frequent content. Another solution carries out even further data compression by using a threshold T which defines a large value. For example, defining T as 100, this system only detects values that are large in terms of source addresses. Therefore, no table entries are necessary until more than 100 source addresses are found.

It also may be desirable to know not only the number of source addresses, but also the rate of increase of the source addresses. For example, it may be desirable to know that, even though a trigger is made after 100 sources, in the next second there are 200 sources, in the second after that there are 400 sources, and the like.

In an embodiment, even more scaling is achievable to advantageously use only a small portion of the entire bit map space. For example, if an identified signature is a frequent signature, the IP address is hashed to a W bit number S_HASH. Only certain bits of that hash are selected, e.g. the low order r bits. That is, this system scales down the count to only sampling a small portion of the bitmap space. However, the same scaling is used to estimate the complete bitmap space. The same scaling down operations are also carried out on the destination address.

For example, an array of 32 bits (i.e., r=32) may be maintained, where the threshold T is 96. Each source address of the content is hashed to a position between 1 and 96. If the position is between 1 and 32, then it is set. If the position is beyond 32, then it is ignored, since there is no portion in the array for that bit.

At the end of a time interval, the number of bits set in the 32-bit array is counted, and corrected for collisions. The value is scaled up based on the number of bits which were ignored. Thus, for any value of T, the number of bits set within the available portion of the registers is counted, and scaled by a factor determined by T. For example, in the previous example, if we had hashed from 1 to 96 but only stored 1 through 32, the final estimate would be scaled up by a factor of 3.
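The scaled bitmap estimate from the 32-of-96 example can be sketched as follows. The function name estimate_sources is hypothetical, SHA-256 is assumed as the address hash, positions are numbered from 0 rather than 1, and the correction for collisions mentioned above is deliberately omitted, so the sketch tends to undercount when many sources are present.

```python
import hashlib


def estimate_sources(addresses, array_bits: int = 32, hash_range: int = 96) -> int:
    """Estimate the number of unique addresses from a sampled bitmap."""
    bitmap = 0
    for address in addresses:
        digest = hashlib.sha256(address.encode()).digest()
        position = int.from_bytes(digest[:8], "big") % hash_range
        if position < array_bits:
            bitmap |= 1 << position  # stored portion of the bitmap space
        # positions at or beyond array_bits are ignored
    bits_set = bin(bitmap).count("1")
    # Scale up by the ratio of the hash range to the stored portion (3 here)
    return bits_set * hash_range // array_bits
```

Repeated packets from the same source always hash to the same position, so they set at most one bit and do not inflate the estimate.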

This technique may also be used to count a rising infection over several intervals, by changing the scaling factor. For example, a different scaling factor is stored along with the array in each interval. This technique can, therefore, reliably count from a very small to a very large number of source addresses with only a very small number of bits, and can also track rising infection levels.

Accordingly, the address is hashed and a scale factor for source addresses is assigned to a variable, e.g., SourceScale. If the high order bits of the hash from positions r+1 to r+SourceScale are all zero, the low order r bits are used to set the corresponding position in the source bit map. For example, if SourceScale is initially 3 and r is 32, essentially all but the low order 35 bits of the hash are ignored and the low order 32 bits of the 35 bits are focused on, a scaling of 2^(35−32)=2^3=8.

When the time interval ends, the counter is cleared, and the variable (SourceScale) is incremented by some amount. If, in the next interval, the scale factor goes up to 4, the scaling focuses on the top 36 bits of the hash, giving a scaling of 2^4=16. Thus, by incrementing the variable (SourceScale) by 1, the amount of source addresses that can be counted is doubled. Thus, when comparing against the threshold for source addresses, the number of bits in the hash is scaled by a factor of 2^(SourceScale−1) before being compared to the threshold. This same technique can also be used for destination IP addresses.

FIG. 7 is a flow diagram illustrating an example process for identifying a worm signature according to an embodiment of the invention. At a high level, a network attack can be detected by receiving a plurality of packets on a network and analyzing the data payloads of those packets to detect common content among the packets. Data reduction techniques may also be employed to optimize the high level process. For example, initially, in step 300, the sensor receives a packet. For example, the previously described communication module may receive the packet. Upon receipt, the packet is parsed in step 310 and header information is extracted and the data payload is divided up into a plurality of strings (subsections) as shown in step 320. In an embodiment, the parsing function may be carried out by the previously described parser module. In a brute force method, the data payload may be divided up into the universe of all possible strings of one or more characters that are present in the data payload. Such an operation, however, is computationally expensive.

Alternatively, the data payload may be divided up into the universe of all strings having a minimum length. While this reduces the number of strings, by an amount that depends on the minimum length, the operation remains computationally expensive.

In an embodiment, the data payload may be divided up into the universe of all strings of a specific length. This operation significantly reduces the number of strings created by the parsing of the data payload without compromise because each string created is representative of all longer strings that include the baseline string. Advantageously, the specific string length can be optimized for detecting invariant strings in viruses and worms.
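To give a sense of the relative cost of the three parsing options, the following sketch counts the strings each option would produce for a payload of n bytes. It assumes that strings are contiguous and are counted by position (so repeated content is counted each time it occurs); the function names and the 1,000-byte payload in the usage note are hypothetical.

```python
def count_all_strings(n: int) -> int:
    """Brute force: every contiguous string of one or more bytes."""
    return n * (n + 1) // 2


def count_min_length_strings(n: int, m: int) -> int:
    """Every contiguous string of at least m bytes."""
    k = n - m + 1  # number of strings of exactly m bytes
    return k * (k + 1) // 2 if k > 0 else 0


def count_fixed_length_strings(n: int, m: int) -> int:
    """Every contiguous string of exactly m bytes."""
    return max(n - m + 1, 0)
```

For a 1,000-byte payload and a length of 40 bytes, these give 500,500 strings, 462,241 strings, and 961 strings respectively, which illustrates why the minimum-length option remains expensive while the specific-length option does not.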

Additionally, if a specific length subsection is employed, then a portion of the data from the previous packet and a portion of the data from the current packet can be combined to create specific length subsections that span the packet boundary, as illustrated in step 330. For example, if the specific length is 40 bytes, then the last 39 bytes of the data payload of the previous packet and the first byte from the data payload of the current packet can be combined to create a single subsection.
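One way the specific-length parsing and the packet-boundary handling of step 330 could be realized is sketched below. The function name subsections and the idea of returning a carry-over buffer to the caller are assumptions made for illustration; the sketch also assumes that consecutive payloads belong to the same flow.

```python
def subsections(payload: bytes, length: int = 40, carry: bytes = b""):
    """Split a payload into every window of `length` bytes.

    `carry` holds up to length-1 trailing bytes of the previous payload so
    that windows spanning the packet boundary are produced as well.
    Returns (windows, new_carry).
    """
    keep = length - 1
    data = (carry[-keep:] if keep > 0 else b"") + payload
    windows = [data[i:i + length] for i in range(len(data) - length + 1)]
    new_carry = data[-keep:] if keep > 0 else b""
    return windows, new_carry
```

With a length of 40 bytes, a one-byte payload that follows a longer packet yields exactly one window, made of the last 39 bytes of the previous payload and the new byte, matching the example above.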

Once the data payload has been parsed into subsections, and combined with portions from an adjacent packet, the subsections are then filtered, as shown in step 340. In one embodiment, the filtering function may be carried out by the previously described filter module. The filtering may be carried out in a series of multi-stage hardware components or it may be carried out in software. The function of the filtering is to reduce the number of data payload subsections that require processing. In an embodiment, the Rabin fingerprint is calculated for each subsection and then only those subsections meeting predetermined criteria are processed further. For example, after the Rabin fingerprint is calculated, each subsection whose fingerprint ends with six (6) zeroes is processed further. This may have the effect of thinning the number of subsections requiring processing to a fraction of the original number, for example to 1/64th of the original number. Furthermore, because Rabin fingerprints are randomly distributed, the creator of a worm or virus cannot know which subsections will be selected for further processing.
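The filtering of step 340 might be sketched as follows. For brevity the sketch substitutes a simple polynomial rolling hash (in the style of Rabin-Karp) for a true Rabin fingerprint, which is computed over polynomials modulo an irreducible polynomial; the constants BASE and MOD and all function names are assumptions chosen for illustration.

```python
MOD = (1 << 61) - 1  # large prime modulus (assumed for the sketch)
BASE = 257           # polynomial base (assumed for the sketch)


def fingerprint(window: bytes) -> int:
    """Directly computed polynomial hash of one window."""
    fp = 0
    for b in window:
        fp = (fp * BASE + b) % MOD
    return fp


def rolling_fingerprints(data: bytes, length: int):
    """Yield (offset, fingerprint) for every window, updating incrementally."""
    if len(data) < length:
        return
    top = pow(BASE, length - 1, MOD)  # weight of the byte leaving the window
    fp = fingerprint(data[:length])
    yield 0, fp
    for i in range(length, len(data)):
        fp = ((fp - data[i - length] * top) * BASE + data[i]) % MOD
        yield i - length + 1, fp


def sampled(data: bytes, length: int = 40, zero_bits: int = 6):
    """Keep only windows whose fingerprint ends in `zero_bits` zero bits."""
    mask = (1 << zero_bits) - 1
    return [(off, fp) for off, fp in rolling_fingerprints(data, length)
            if fp & mask == 0]
```

Because the selection depends only on the content of the window, the same invariant string is either always sampled or never sampled, regardless of which packet carries it.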

In one example, if the specific string length is 40 bytes and the thinning ratio is 1/64, the probability of tracking a worm with a signature of 100 bytes is 55%. However, the probability of tracking a worm with a signature of 200 bytes is 92% and the probability of tracking a worm with a signature of 400 bytes is 99.64%. Notably, all known worms today have had invariant content (i.e., a signature) of at least 400 bytes.
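The trend in these figures can be approximated with a simple model, sketched below, in which each of the windows contained in the invariant content is assumed to be sampled independently with probability equal to the thinning ratio. This model and the function name detection_probability are assumptions and are not stated in the disclosure. Under this assumption the 200-byte and 400-byte cases come out close to the stated 92% and 99.64%, while the 100-byte case comes out somewhat higher than the stated 55%, so the figures above presumably rest on a slightly different model.

```python
def detection_probability(signature_len: int,
                          window: int = 40,
                          sampling_ratio: float = 1 / 64) -> float:
    """Probability that at least one window of the signature is sampled,
    assuming each window is sampled independently (an assumption)."""
    windows = signature_len - window + 1
    if windows <= 0:
        return 0.0  # signature too short to contain a full window
    return 1.0 - (1.0 - sampling_ratio) ** windows
```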

After the data payload subsections have been filtered, processing step 350 can be undertaken to determine if all of the subsections have been processed. If they have, then the process loops back to receive the next packet in step 300. If not all of the subsections have been processed, then in step 360 a key is created for each data subsection. In one embodiment, the key creation may be carried out by the previously described key module. The key preferably includes the protocol identifier, the destination port, and the Rabin fingerprint for the data subsection. Alternatively, the key can include the Rabin fingerprint alone, or the Rabin fingerprint and the protocol, or the Rabin fingerprint and any combination of other identifying information. Once the key is created, the address dispersion table is consulted in step 370 to see if the key exists in the table. If the key does not exist in the address dispersion table, then the content prevalence table is updated in step 380 accordingly and the count for the key is initialized if the entry is new or incremented if the key was already present in the content prevalence table. If the count is incremented and the new count exceeds the predetermined threshold number as determined in step 390, then an entry for the key is created in the address dispersion table as illustrated in step 400. If the count does not exceed the threshold (or after the address dispersion table entry is created), the process returns to step 350 to determine if all subsections have been processed.

After an entry is created in the address dispersion table in step 400, then the entry is updated to fill in the necessary information. Additionally, back in step 370, if it is determined that an address dispersion table entry exists for the particular key, then the entry is updated. For example, the count for the source IP address and the count for the destination IP address can be updated in step 410 if those addresses are unique and have not yet been associated with the particular key.

After the address counts are updated, it is determined in step 420 if the new counts exceed a predetermined threshold. If they do, then in step 430 the key is reported (e.g., to the aggregator) as a possible signature for a suspected worm. Additionally, the packet (or packets if the data for the key came from two adjacent packets) containing the key is also reported. If the counts do not exceed the threshold (or after the report has been made), then the process returns to step 350 to determine if all subsections have been processed.
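The table handling of steps 370 through 430 might be sketched as follows. The class names Sifter and DispersionEntry, the threshold values, the use of exact address sets in place of scaled bit maps, and the requirement that both the source count and the destination count exceed the threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DispersionEntry:
    """Distinct addresses seen for one key (exact sets; a simplification)."""
    sources: set = field(default_factory=set)
    destinations: set = field(default_factory=set)


class Sifter:
    def __init__(self, prevalence_threshold: int = 3,
                 dispersion_threshold: int = 2):
        self.prevalence = defaultdict(int)  # content prevalence table
        self.dispersion = {}                # address dispersion table
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold

    def observe(self, key, src: str, dst: str) -> bool:
        """Process one key; return True if it should be reported."""
        entry = self.dispersion.get(key)                    # step 370
        if entry is None:
            self.prevalence[key] += 1                       # step 380
            if self.prevalence[key] <= self.prevalence_threshold:
                return False                                # step 390: below threshold
            entry = self.dispersion[key] = DispersionEntry()  # step 400
        entry.sources.add(src)                              # step 410
        entry.destinations.add(dst)
        return (len(entry.sources) > self.dispersion_threshold       # step 420
                and len(entry.destinations) > self.dispersion_threshold)
```

A key here could be a tuple such as (protocol identifier, destination port, fingerprint), as described for step 360; a True result corresponds to the report of step 430.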

Those of skill will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein can often be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled persons can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the invention. In addition, the grouping of functions within a module, block, circuit or step is for ease of description. Specific functions or steps can be moved from one module, block or circuit to another without departing from the invention.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein can be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or technique described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium. An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC.

The above description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein can be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

1. A system, comprising at least one hardware module, for detecting a network attack, comprising: a communication module configured to receive a plurality of packets on a network; and a signature module configured to receive said plurality of packets from the communication module and analyze the content of said packets to detect common content among said packets to identify a network attack; and a content analysis module configured to analyze the common content of said plurality of packets, including criteria not based on a known vulnerability, to identify a network attack; wherein the content analysis module comprises a correlation module configured to determine whether packets sent in a first interval to a destination address are sent from said destination address in a second interval.
 2. The system of claim 1, wherein the signature module further comprises: a parser module configured to parse packets into a plurality of portions; a key module configured to perform a data reduction on each of said plurality of portions to create a plurality of data reduced portions; and a data module configured to store said data reduced portions in a data storage area, wherein the reduced data portions in the plurality of data reduced portions have a smaller size and a constant predetermined relation with portions into which the packets are parsed and at least some of the portions into which the packets are parsed that differ are reduced to the same reduced data portion.
 3. The system of claim 2, further comprising a filter module configured to reduce the number of said plurality of data reduced portions prior to storage.
 4. The system of claim 1, wherein the content analysis module further comprises a code detection module configured to identify the presence of executable code in a packet.
 5. A computer implemented method for analyzing network activity, comprising: receiving, at a computer, a plurality of packets transiting a network; analyzing, at the computer, the content of said plurality of packets to detect common content among said packets; identifying, at the computer, network attacks based upon said analysis for common content, wherein criteria for the analysis of the common content of said plurality of packets includes criteria not based on a known vulnerability; and comparing the destinations of said plurality of packets with destinations having known vulnerabilities.
 6. The method of claim 5 wherein analyzing the content further comprises performing a data reduction on at least a portion of each of the packets of said plurality of packets to form a plurality of data reduced packets, wherein the reduced data packets in the plurality of data reduced packets have a smaller size and a constant predetermined relation with the packets transiting the network and at least some of the packets transiting the network that differ are reduced to the same reduced data packet.
 7. The method of claim 6 wherein said data reduction comprises performing a hash function on at least a portion of each of the packets.
 8. The method of claim 7 wherein said hash function is an incremental hash function.
 9. The method of claim 6, wherein analyzing further comprises analyzing a subset of the data reduced portions, the subset identified by a common characteristic of each of the data reduced portions.
 10. The method of claim 6, wherein analyzing further comprises analyzing a subset of the data reduced portions, the subset identified by a common characteristic of each of the data reduced portions.
 11. The method of claim 10, wherein the common characteristic is the reduced data portion being equal to a value in a set of predefined values.
 12. The method of claim 5 wherein identifying further comprises identifying one or more signatures of an attack.
 13. The method of claim 5, wherein analyzing the content includes analyzing the content of the payloads in the plurality of packets.
 14. The method of claim 5 further comprising determining whether there is an increasing number of sources and destinations of packets having common content.
 15. The method of claim 5 further comprising analyzing the content of said plurality of packets for the presence of a specified type of code.
 16. The method of claim 5 further comprising forming a plurality of portions from each of said plurality of packets, each portion comprising a specified subset of a packet.
 17. A computer implemented method for analyzing network activity comprising: obtaining a plurality of packets being transmitted across a network; performing a data reduction on at least a portion of each of the packets of said plurality of packets to form a plurality of data reduced packets, wherein the reduced data packets in the plurality of data reduced packets have a smaller size and a constant predetermined relation with the packets being transmitted across the network and at least some of the packets being transmitted across the network that differ are reduced to the same reduced data packet; detecting repetition of content among said plurality of data packets based on the reduced data packets; comparing destinations of said plurality of packets with destinations having known vulnerabilities; and identifying network attacks based upon said detection of repetition and said comparison of destinations.
 18. The method of claim 17 wherein the destinations of said plurality of packets are only analyzed for packets wherein a repetition of content has been detected.
 19. The method of claim 17, further comprising analyzing the content of the payloads in the plurality of packets.
 20. The method of claim 19 further comprising analyzing the content of said plurality of packets for the presence of a specified type of code.
 21. The method of claim 20 further comprising: maintaining a first list of addresses; forming a second list of sources that have sent to addresses on said first list; and comparing a current source of common content to said second list.
 22. The method of claim 17 wherein said data reduction comprises carrying out a hash function on at least a portion of each of the packets.
 23. The method of claim 17 wherein said detecting common content comprises using at least first, second and third data reduction techniques on at least a portion of each of the packets, to obtain at least first, second and third results, and to count said first, second and third results, and to detect repetition when all of said at least first second and third results have a frequency of occurrence greater than a specified amount.
 24. The method of claim 17 further comprising: monitoring a first content sent to a destination; monitoring a second content sent by said destination; and determining whether there is a correlation between said first content and said second content.
 25. The method of claim 17 further comprising forming a plurality of portions from each of said plurality of packets, each portion comprising a specified subset of a packet.
 26. The method of claim 25, wherein: forming a plurality of portions further comprises identifying all portions comprising a specified subset of a packet, and forming a plurality of portions from a subset of said all portions; each portion in the subset has a common characteristic.
 27. The method of claim 26, wherein the specified subset is a segment of data from the data payload having a predetermined byte size.
 28. The method of claim 25, wherein: a first portion of the plurality is from two or more adjacent packets; and the first portion is obtained by storing a minimum length subsection comprising data from the two or more adjacent packets.
 29. A method of analyzing network activity comprising: obtaining, at a computer, a plurality of packets transiting a network; performing, at a computer, a data reduction on at least a portion of each of the packets of said plurality of packets to form a plurality of data reduced packets, wherein the reduced data packets in the plurality of data reduced packets have a smaller size and a constant predetermined relation with the packets transiting the network and at least some of the packets transiting the network that differ are reduced to the same reduced data packet; analyzing, at a computer, said plurality of data reduced packets to detect a repetition of at least a portion of content among said plurality of data packets based on criteria not necessarily based on a known vulnerability; comparing the destinations of said plurality of packets with destinations having known vulnerabilities; and analyzing, at a computer, said packets having repetitive content to determine if said packets are spreading.
 30. The method of claim 29, wherein analyzing for spreading comprises determining whether there is an increasing number of sources and destinations of packets having common content.
 31. The method of claim 29 wherein analyzing spreading comprises: monitoring a first content sent to a destination; monitoring a second content sent by said destination; and determining whether there is a correlation between said first content and said second content.
 32. The method of claim 31 further comprising analyzing the content of said plurality of packets for the presence of a specified type of code.
 33. The method of claim 29 further comprising determining a signature of an attack based upon said analyzing of said plurality of data reduced packets and said analyzing spreading.
 34. The method of claim 29, wherein analyzing of said plurality of data reduced packets includes analyzing the content of the payloads in the plurality of packets.
 35. The method of claim 29 wherein said data reduction comprises carrying out a hash function on at least a portion of each of the packets.
 36. The method of claim 29 further comprising using at least first, second and third data reduction techniques on at least a portion of each of the packets, to obtain at least first, second and third results, and to count said first, second and third results, and to establish frequently occurring sections when all of said at least first second and third results have a frequency of occurrence greater than a specified amount.
 37. The method of claim 29 further comprising forming a plurality of portions from each of said plurality of packets, each portion comprising a specified subset of a packet.
 38. The method of claim 37, wherein a first portion of the plurality is from at least two packets.
 39. The method of claim 37, wherein analyzing further comprises analyzing a subset of the plurality of portions, the subset identified by a common characteristic of each of said portions.
 40. The method of claim 39, wherein the specified subset is a segment of data from the data payload having a predetermined byte size.
 41. An apparatus comprising: a signature generator, having a connection to a network, to obtain a portion of data from the network, operating to carry out a data reduction on said data portion by carrying out a hash function to reduce said data portion to a reduced data portion in a repeatable manner; a memory, storing said reduced data portions, wherein said signature generator also operates to detect common elements within said reduced data portion, and a content analysis module configured to analyze the common elements using criteria not based on a known vulnerability, to identify a network attack; wherein the content analysis module comprises a correlation module configured to determine whether packets sent in a first interval to a destination address are sent from said destination address in a second interval.
 42. An apparatus as in claim 41, wherein said signature generator determines frequently occurring sections of message information within said reduced data portion.
 43. An apparatus as in claim 42, further comprising a module that carries out an additional test on said frequently occurring sections of message information after said signature generator determines frequently occurring sections of message information.
 44. An apparatus as in claim 43, wherein said additional test is a test to look for an increasing number of at least one of sources and destinations of said frequently occurring sections of message information.
 45. An apparatus as in claim 44, wherein said module is a module to look for code within the frequently occurring sections.
 46. An apparatus as in claim 43, wherein said module: maintains a first list of unassigned addresses in said memory; forms a second list of sources that have sent to addresses on said first list; and compares a current source of a frequently occurring section to said second list.
 47. An apparatus as in claim 46, wherein said module data reduces information prior to storing in said list.
 48. An apparatus as in claim 47, wherein said portion of a network header is a port number indicating a service requested by a network packet.
 49. An apparatus as in claim 46, wherein said module operates to: first monitor a first content sent to a destination; second monitor a second content sent by said destination; and determine a correlation between said first content and said second content as said additional test.
 50. An apparatus as in claim 49, wherein: said first monitoring comprises monitoring multiple destinations; and said second monitoring comprises monitoring multiple destinations during a different time period than said first monitoring.
 51. An apparatus as in claim 46 wherein said portion of data further includes a portion of a network header.
 52. An apparatus as in claim 46, wherein said portion of data comprises a first subset of a network packet including payload and header and wherein said data portion module further obtains a second subset of the same network packet for subsequent analysis.
 53. An apparatus as in claim 52, wherein said data portion module forms a plurality of portions from each network packet, each of said plurality of portions comprising a specified subset of the network packet.
 54. An apparatus as in claim 46, wherein said signature generator operates to form a hash function of at least one of the source or destination address, to form hash values using the hash function, to first determine a unique number of said hash values, and to second determine a number of said one of source or destination addresses based on said unique number from said first determine.
 55. An apparatus as in claim 54, wherein said count carried out by said signature generator further comprises scaling the hash values prior to said second determine.
 56. An apparatus as in claim 55, wherein said scaling comprises scaling by a first value during a first counting session, and scaling by a second value during a second measurement interval.
 57. An apparatus as in claim 42, wherein said determining frequently occurring sections is done by using at least first, second and third data reduction techniques on each said portion, to obtain first, second and third results, and to count said first, second and third results, and to establish frequently occurring sections when all of said first second and third results have a frequency of occurrence greater than a specified amount.
 58. An apparatus as in claim 41, wherein said memory stores information indicative of at least one of a number of sources sending the common content, and/or destinations that are receiving the common content, and said signature generator determines whether said number is increasing.
 59. An apparatus as in claim 41, wherein said signature generator also analyzes for the presence of a specified type of code within said data portion.
 60. An apparatus as in claim 41, further comprising forming a plurality of portions from each network packet, each of said plurality of portions comprising a continuous portion of payload, and information indicative of a port number requested by a network packet.
 61. An apparatus as in claim 41, further comprising: first and second hash generators, respectively forming first and second hash functions of said portions; a first counter, with a plurality of stages, connected such that respective stages of said counter are incremented based on said first hash function; a second counter, with a plurality of stages, and connected such that respective stages of said counter are incremented based on said first hash function.
 62. An apparatus as in claim 61, further comprising a module that checks said one of said stages of said first counter and said one of said stages of said second counter against a threshold, and identifies said portion as frequent content only when both said one of said stages of said first counter and said one of said stages of said second counter are both above said threshold.
 63. An apparatus as in claim 62, further comprising a frequent content buffer table storing specified frequent content.
 64. An apparatus as in claim 63, further comprising at least a third counter, and a third hash generator, taking a third hash of said portion, and incrementing a stage of said third counter based on said third hash, where said module identifies said portion as frequent content only when all of said stages of each of said first, second and third counters are each above said threshold.
 65. An apparatus as in claim 64, wherein said signature generator includes a sliding window portion that first obtains said portion by taking a first part of the message, and subsequently obtains said portion by taking a second part of the message.
 66. An apparatus as in claim 65, wherein at least one of said hash functions is an incremental hash function.
 67. An apparatus as in claim 41, wherein said memory stores a list of computers on the network, and stores an update level for each of said computers indicating which of said computers is susceptible to a specified kind of attack, and a module which monitors for said kind of attack only when the message is directed for a computer which is susceptible to said kind of attack.
 68. An apparatus of claim 67 where said module checks for a message that attempts to exploit a known vulnerability to which a computer is vulnerable, as said specified attack.
 69. An apparatus as in claim 68, wherein said module checks for a field that is longer than a specified length. 